Towards Basic Categories for Describing Properties of Texts in a Corpus

Author

  • Serge Sharoff
Abstract

The paper discusses the basic principles for describing the properties of texts stored in a corpus. It proposes a standard that is used in most of the corpora developed at the University of Leeds and that could potentially be applied to describing texts in any corpus-collection effort. The standard defines the minimal subset of tags and attributes needed to describe the texts stored in a corpus. The proposed text typology helps position a corpus under development relative to a reference corpus that covers all possible features, by explicitly selecting the subset of features to be considered in a given study.
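The abstract does not list the tags and attributes of the standard itself, so the following is only a minimal sketch of the general idea, under stated assumptions: each text is described by a fixed subset of attributes, and a corpus is positioned against a reference feature set by recording which features it selects and which it omits. The feature names in `REFERENCE_FEATURES` and the names `CorpusProfile`, `position`, and `describe_text` are hypothetical illustrations, not the paper's tag set or any published API.

```python
from dataclasses import dataclass

# HYPOTHETICAL reference feature inventory standing in for a "reference corpus
# covering all possible features"; these names are illustrative only.
REFERENCE_FEATURES = frozenset(
    {"author", "title", "date", "mode", "audience", "domain", "genre"}
)


@dataclass
class CorpusProfile:
    """A corpus under development, described by the features it records."""
    name: str
    features: frozenset[str]

    def position(
        self, reference: frozenset[str] = REFERENCE_FEATURES
    ) -> dict[str, frozenset[str]]:
        """Split the reference features into those covered and those omitted."""
        unknown = self.features - reference
        if unknown:
            raise ValueError(f"features not in the reference set: {sorted(unknown)}")
        return {"covered": self.features, "omitted": reference - self.features}


def describe_text(profile: CorpusProfile, **attributes: str) -> dict[str, str]:
    """Build a text record that supplies exactly the profile's selected attributes."""
    supplied = set(attributes)
    missing = self_missing = profile.features - supplied
    extra = supplied - profile.features
    if missing or extra:
        raise ValueError(f"missing={sorted(self_missing)} extra={sorted(extra)}")
    return dict(attributes)


# Usage: a (hypothetical) web corpus that records only three of the features.
web = CorpusProfile("web-sample", frozenset({"author", "date", "genre"}))
print(sorted(web.position()["omitted"]))  # ['audience', 'domain', 'mode', 'title']
record = describe_text(web, author="unknown", date="2004", genre="news")
```

Making the selected subset explicit in one place means every text record can be validated against it, and the omitted features document how the corpus differs from the reference set.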

Similar sources

Using Web Corpus Statistics to Infer Conceptual Structure

The basic level is the level of conceptual structure at which categories are maximally informative. In this research, we investigated whether the privileged status of the basic level might be captured by the statistical properties of the Web. Using Google’s Web search programming interface, we found that frequency ratios for terms across three levels of abstraction (superordinate, basic, and su...

Towards a reference corpus of web genres

Genres of spoken and written texts are being intensively studied from various angles, e.g., communication studies, discourse analysis, computational linguistics, without arriving at a generally accepted definition. Many corpora have been built to represent the language, but very few large corpora indicate genres, and when they do the typology of genres varies widely. For instance, the Brown cor...

Vocabulary Lists for EAP and Conversation Students

Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...

Move-based investigation of appraisal in the introduction section of Applied Linguistics research articles: Similarities and differences between L1 and L2 English texts

Recent research has shown that academic writing is not ‘author-evacuated’ but, rather, carries a representation of the writers’ identity. One way through which writers project their identity in academic writing is stance-taking toward propositions advanced in the text. Appropriate stance-taking has proved to be challenging for novice writers of Research Articles (RAs), especially those writing ...

The System of Engagement in a Sample of Prose Fiction and the News

Emerging within Systemic Linguistics, Appraisal/Evaluation is a framework for analyzing the language of evaluation, providing techniques for the systematic analysis of evaluation and stance as they operate in whole texts and in groupings of texts. There are three systems in the Appraisal framework: Attitude, Engagement, and Graduation. This study sets out to analyze the use of the system of Eng...

Publication date: 2004